Genetic Programming, Validation Sets, and Parsimony Pressure
Abstract
Fitness functions based on test cases are very common in Genetic Programming (GP). This process can be likened to a learning task, in which models are inferred from a limited number of samples. This paper investigates two methods for improving generalization in GP-based learning: 1) selecting the best-of-run individuals using a three-data-set methodology, and 2) applying parsimony pressure to reduce the complexity of the solutions. Results using GP in a binary classification setup show that, while accuracy on the test sets is preserved, with lower variance than the baseline results, the mean tree size obtained with the tested methods is significantly reduced.

This paper is an experimental study of methodologies for Evolutionary Computation (EC) inspired by common practices in the Machine Learning (ML) and Pattern Recognition (PR) communities. More specifically, using GP for supervised learning, we aim to evaluate both the effect of using a three-data-set methodology (training, validation, and test sets) and the effect of minimizing the classifiers' complexity. Our experiments show that these approaches preserve the performance of GP while significantly reducing the size of the best-of-run solutions, which is in accordance with the Occam's Razor principle. The paper is structured as follows. Section 1 starts with a high-level description of the tested approaches and their justifications. A presentation of relevant work follows in Section 2. Thereafter, the methodology used in the experiments is detailed in Section 3. Finally, Section 4 presents the experimental results obtained on six binary classification data sets, and Section 5 concludes the paper.
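The three-data-set idea can be illustrated with a minimal Python sketch. This is not the paper's implementation: the toy one-dimensional data, the threshold classifiers standing in for evolved GP trees, and the helper names `error_rate` and `best_of_run` are all assumptions made for illustration. The candidates are indistinguishable on the training set, so the validation set is used to choose the best-of-run individual, and the test set is consulted only once at the end.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]           # (feature, label) for a toy 1-D problem
Classifier = Callable[[float], int]  # stands in for an evolved GP tree


def error_rate(clf: Classifier, data: Sequence[Sample]) -> float:
    """Fraction of samples in `data` that `clf` misclassifies."""
    return sum(clf(x) != y for x, y in data) / len(data)


def best_of_run(candidates: List[Classifier],
                validation: Sequence[Sample]) -> Classifier:
    """Return the candidate with the lowest validation error (first on ties)."""
    return min(candidates, key=lambda c: error_rate(c, validation))


# Hypothetical data split: training drives evolution, validation picks the
# best-of-run individual, test gives the final, untouched estimate.
train = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
validation = [(0.2, 0), (0.45, 0), (0.55, 1), (0.8, 1)]
test = [(0.3, 0), (0.7, 1)]

# Hypothetical end-of-run candidates; all three have zero training error.
candidates: List[Classifier] = [
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.42),
    lambda x: int(x > 0.58),
]

chosen = best_of_run(candidates, validation)
test_error = error_rate(chosen, test)
```

Because every candidate fits the training samples perfectly, training error alone cannot separate them; the held-out validation samples do.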
Similar resources
Effects of Code Growth and Parsimony Pressure on Populations in Genetic Programming
Parsimony pressure, the explicit penalization of larger programs, has been increasingly used as a means of controlling code growth in genetic programming. However, in many cases parsimony pressure degrades the performance of the genetic program. In this paper we show that poor average results with parsimony pressure are a result of 'failed' populations that overshadow the results of populations...
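As a rough illustration of "explicit penalization of larger programs", one common form subtracts a size-proportional term from the raw fitness. The sketch below assumes a linear penalty; the function name `penalized_fitness` and the coefficient value are hypothetical and not taken from the paper listed above.

```python
def penalized_fitness(raw_fitness: float, size: int,
                      coefficient: float = 0.01) -> float:
    """Linear parsimony pressure: raw fitness minus a size-proportional penalty.

    `raw_fitness` is assumed to be higher-is-better (e.g. accuracy) and
    `size` is the number of nodes in the program tree. The default
    coefficient is an arbitrary illustrative value.
    """
    return raw_fitness - coefficient * size


# Hypothetical programs: a small one and a slightly more accurate large one.
small = penalized_fitness(0.90, size=20)
large = penalized_fitness(0.92, size=200)
```

With a coefficient of zero the larger program wins on raw accuracy alone; with the penalty switched on, the smaller program is preferred. A coefficient that is too large can overwhelm the fitness signal entirely.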
Effects of Code Growth and Parsimony Pressure on Populations in Genetic Programming
Parsimony pressure, the explicit penalization of larger programs, has been increasingly used as a means of controlling code growth in genetic programming. However, in many cases parsimony pressure degrades the performance of the genetic program. In this paper we show that poor average results with parsimony pressure are a result of "failed" populations which overshadow the results of population...
Effects of Code Growth and Parsimony Pressure on Populations in Genetic Programming Terence
Parsimony pressure has been increasingly used as a means of controlling code growth in genetic programming. However, several published papers have shown that in some cases its use can degrade the performance of the genetic program [Koza, 1992; Nordin and Banzhaf, 1995]. In this paper we show that poor average results with parsimony pressure are a result of "failed" populations which overshadow th...
Lexicographic Parsimony Pressure
We introduce a technique called lexicographic parsimony pressure, for controlling the significant growth of genetic programming trees during the course of an evolutionary computation run. Lexicographic parsimony pressure modifies selection to prefer smaller trees only when fitnesses are equal (or equal in rank). This technique is simple to implement and is not affected by specific differences i...
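The selection rule described above (prefer the smaller tree only when fitnesses are equal) can be sketched as a tournament whose comparison key is the pair (fitness, negated size). This is an illustrative sketch, not the authors' code: the `Individual` type, the function name, and the higher-is-better fitness convention are assumptions, and only exact fitness ties are handled, not the rank-based variant the abstract mentions.

```python
import random
from typing import List, NamedTuple


class Individual(NamedTuple):
    fitness: float  # assumed higher-is-better
    size: int       # number of nodes in the program tree


def lexicographic_tournament(population: List[Individual], k: int,
                             rng: random.Random) -> Individual:
    """Draw k distinct contestants; compare by fitness, then by smaller size."""
    contestants = rng.sample(population, k)
    # Tuples compare lexicographically, so size only matters on fitness ties.
    return max(contestants, key=lambda ind: (ind.fitness, -ind.size))


# Hypothetical population: two equally fit trees of different sizes,
# plus a much smaller but less fit tree.
population = [Individual(0.9, 50), Individual(0.9, 12), Individual(0.8, 3)]
winner = lexicographic_tournament(population, k=3, rng=random.Random(0))
```

With all three individuals in the tournament, the 12-node tree wins: it ties on fitness with the 50-node tree and is smaller, while the 3-node tree loses on fitness despite its size.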
Covariant Parsimony Pressure for Genetic Programming
The parsimony pressure method is perhaps the simplest and most frequently used method to control bloat in genetic programming. In this paper we first reconsider the size evolution equation for genetic programming developed in [24] and rewrite it in a form that shows its direct relationship to Price’s theorem. We then use this new formulation to derive theoretical results that show how to practi...